AITopics | task interference

Multimodal large language models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, a generalist MLLM typically underperforms compared with a specialist MLLM on most VL tasks, which can be attributed to task interference. In this paper, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM. Our MoME is composed of two key components, a mixture of vision experts (MoVE) and a mixture of language experts (MoLE). MoVE can adaptively modulate the features transformed from various vision encoders, and has a strong compatibility in transformation architecture. MoLE incorporates sparsely gated experts into LLMs to achieve painless improvements with roughly unchanged inference costs. In response to task interference, our MoME specializes in both the vision and language modalities to adapt to task discrepancies. Extensive experiments show that MoME significantly improves the performance of generalist MLLMs across various VL tasks.
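
The phrase "sparsely gated experts" in the abstract above can be illustrated with a generic top-k routing sketch. This is not the MoME implementation: the toy experts, the `router` function, and the choice of k are all hypothetical, and the abstract does not specify how MoLE computes its gates. The point is only that each input runs through k experts rather than all of them, which is why inference cost stays roughly flat as experts are added.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(logits, k):
    """Return {expert_index: weight} for the k largest logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))

def moe_forward(x, experts, router, k=1):
    """Evaluate only the k experts chosen by the router and mix their outputs."""
    gate = top_k_gate(router(x), k)
    out = [0.0] * len(x)
    for idx, w in gate.items():
        y = experts[idx](x)  # experts outside `gate` are never called
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, sorted(gate)

# Hypothetical toy experts, each mapping a vector to a vector of the same size.
experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [-t for t in v],
]

def router(v):
    """Hypothetical router: one logit per expert, from a trivial feature."""
    s = sum(v)
    return [s, -s, 0.0]

y, used = moe_forward([1.0, 2.0], experts, router, k=1)
```

With k=1 the router picks a single expert per input; raising k trades extra compute for a smoother mixture.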

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)

e991e5587c1daa49bbf9a818b3f02f9a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 18:03:50 GMT

artificial intelligence, machine learning, trire, (18 more...)

Neural Information Processing Systems

Country: Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (0.93)

Industry:

Health & Medicine (0.93)
Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

4a3a14b9536806a0522930007c5512f7-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 12:55:37 GMT

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

11fc8c98b46d4cbdfe8157267228f7d7-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 13:05:54 GMT

arxiv preprint arxiv, conditional moe, generalist model, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Neural Information Processing Systems | Oct-10-2025, 01:28:52 GMT

Multimodal large language models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks.

arxiv preprint arxiv, task interference, vision encoder, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

FroM: Frobenius Norm-Based Data-Free Adaptive Model Merging

Li, Zijian, Feng, Xiaocheng, Liu, Huixin, Huang, Yichong, Liu, Ting, Qin, Bing

arXiv.org Artificial Intelligence | Sep-18-2025

With the development of large language models, fine-tuning has emerged as an effective method to enhance performance in specific scenarios by injecting domain-specific knowledge. In this context, model merging techniques provide a solution for fusing knowledge from multiple fine-tuned models by combining their parameters. However, traditional methods often encounter task interference when merging fully fine-tuned models, and this problem becomes even more evident in parameter-efficient fine-tuning scenarios. In this paper, we introduce an improvement to the RegMean method, which indirectly leverages the training data to approximate the outputs of the linear layers before and after merging. We propose an adaptive merging method called FroM, which directly measures the model parameters using the Frobenius norm, without any training data. By introducing an additional hyperparameter for control, FroM outperforms baseline methods across various fine-tuning scenarios, alleviating the task interference problem.
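
To make the idea of data-free, norm-based merging concrete, here is a minimal sketch that weights each model's parameter change by its Frobenius norm. It is an illustration only: the abstract does not give FroM's actual merging rule, so the weighting formula, the function names, and the exponent `alpha` (standing in loosely for the "additional hyperparameter") are all assumptions.

```python
import math

def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2-D list."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

def norm_weighted_merge(base, finetuned, alpha=1.0):
    """Merge fine-tuned weight matrices without using any training data.

    Hypothetical rule (NOT taken from the FroM paper): each model's delta
    (fine-tuned minus base) gets a coefficient proportional to its Frobenius
    norm raised to `alpha`. alpha=0 reduces to plain averaging of deltas.
    Assumes alpha >= 0 and that all matrices share the base's shape.
    """
    deltas = [
        [[f - b for f, b in zip(f_row, b_row)] for f_row, b_row in zip(ft, base)]
        for ft in finetuned
    ]
    scores = [frobenius_norm(d) ** alpha for d in deltas]
    total = sum(scores)
    if total == 0.0:
        # Every delta is zero: fall back to uniform coefficients.
        coeffs = [1.0 / len(deltas)] * len(deltas)
    else:
        coeffs = [s / total for s in scores]
    merged = [
        [b + sum(c * d[i][j] for c, d in zip(coeffs, deltas))
         for j, b in enumerate(row)]
        for i, row in enumerate(base)
    ]
    return merged, coeffs

# Toy 2x2 weights: model A moved far from the base, model B only slightly.
base = [[0.0, 0.0], [0.0, 0.0]]
model_a = [[3.0, 0.0], [0.0, 0.0]]
model_b = [[0.0, 1.0], [0.0, 0.0]]
merged, coeffs = norm_weighted_merge(base, [model_a, model_b], alpha=1.0)
```

Only the parameters themselves enter the computation, which is the sense in which such a scheme is data-free; whether larger-norm deltas should be weighted up or down is a design choice this sketch does not settle.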

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.02478

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Tensorized Clustered LoRA Merging for Multi-Task Interference

Su, Zhan, Mo, Fengran, Liang, Guojun, Zhang, Jinghan, Wen, Bingbing, Tiwari, Prayag, Nie, Jian-Yun

arXiv.org Artificial Intelligence | Aug-7-2025

Despite the success of the monolithic dense paradigm of large language models (LLMs), LoRA adapters offer an efficient solution by fine-tuning small task-specific modules and merging them with the base model. However, in multi-task settings, merging LoRA adapters trained on heterogeneous sources frequently causes task interference, degrading downstream performance. To address this, we propose a tensorized clustered LoRA (TC-LoRA) library that targets task interference at both the text level and the parameter level. At the text level, we cluster the training samples in the embedding space to capture input-format similarities, then train a specialized LoRA adapter for each cluster. At the parameter level, we introduce a joint Canonical Polyadic (CP) decomposition that disentangles task-specific and shared factors across LoRA adapters. This joint factorization preserves essential knowledge while reducing cross-task interference. We conduct extensive experiments on out-of-domain zero-shot and skill-composition tasks, including reasoning, question answering, and coding. Compared to strong SVD-based baselines, TC-LoRA achieves +1.4% accuracy on Phi-3 and +2.3% on Mistral-7B, demonstrating the effectiveness of TC-LoRA in LLM adaptation.
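
The text-level step described above (cluster samples in embedding space, then train one adapter per cluster) can be sketched with a small k-means routine. The abstract does not name the clustering algorithm, so k-means, the fixed initial centroids, and the toy 2-D "embeddings" are assumptions made for illustration; the adapter training itself is out of scope here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if empty)."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(list(old))
            continue
        new_centroids.append(
            [sum(m[d] for m in members) / len(members) for d in range(len(old))]
        )
    return new_centroids

def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm from caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(iters):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return labels, centroids

# Hypothetical sample embeddings forming two obvious groups.
embeddings = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centroids = kmeans(embeddings, centroids=[[0.0, 0.0], [5.0, 5.0]])

# Sample indices per cluster; each group would get its own LoRA adapter.
groups = {}
for index, label in enumerate(labels):
    groups.setdefault(label, []).append(index)
```

Each entry of `groups` is the training subset for one specialized adapter; at inference time the same nearest-centroid rule could pick which adapter to apply, though the abstract does not say how TC-LoRA handles that step.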

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2508.03999

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Neural Information Processing Systems | May-27-2025, 00:17:41 GMT

Multimodal large language models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, a generalist MLLM typically underperforms compared with a specialist MLLM on most VL tasks, which can be attributed to task interference. In this paper, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM. Our MoME is composed of two key components, a mixture of vision experts (MoVE) and a mixture of language experts (MoLE). MoVE can adaptively modulate the features transformed from various vision encoders, and has a strong compatibility in transformation architecture.

artificial intelligence, large language model, natural language, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.81)
